Comparing Genetic Algorithms Computational Performance Improvement Techniques
Abstract
This paper compares three methods for saving previously calculated fitness values across generations of a genetic algorithm. These methods lead to significant computational performance improvements. For real world problems, the computational effort spent on evaluating the fitness function far exceeds that of the genetic operators. As the population evolves, diversity usually diminishes, which causes the same chromosomes to be reevaluated frequently. By using appropriate data structures to store the evaluated fitness values of chromosomes, significant performance improvements are realized. Several different data structures are compared and contrasted. This paper demonstrates performance improvements for different sets of genetic algorithm parameters, including selection type, population size, and level of mutation.

Although genetic algorithms (GAs) are robust global optimizers (Goldberg 1989; Holland 1992), they are slower to converge than gradient-based methods (Povinelli and Feng 1999b). Hashing provides an effective method for improving a GA’s computational performance (Povinelli and Feng 1999b). This paper investigates further techniques for improving GA performance. Three methods are evaluated for their effectiveness in improving GA computational performance. The first, included for comparison purposes, is the hashing technique previously introduced in Povinelli and Feng (1999b). The second method saves the current generation’s fitness values for use by the following generation. The third uses a binary tree to store previously calculated fitness values.

The paper is broken into four sections. The first section presents the problem statement, discusses the optimization problem, and details the testing platform. The second section discusses each of the three potential solutions. The third section presents the results and discusses their significance. The final section draws several conclusions and briefly discusses future research.

PROBLEM STATEMENT

This research originated with profiling of the computation time of a GA. For complex, real world problems, most time is spent evaluating the fitness function (Povinelli and Feng 1999b). Studying the convergence criteria and the diversity characteristics of an evolving GA shows that fitness values are frequently recalculated. This suggests an opportunity for performance improvement. By efficiently storing fitness values with any of the three methods, GA performance can be dramatically improved. For the test problem, hashing provides the best performance improvement, followed by the keep last generation algorithm, and finally the binary tree technique.

Previously it was shown that hashing can provide a computational performance improvement of more than 50% on a complex real world problem (Povinelli and Feng 1999b). Feedback from the presentation of these results was the impetus to explore alternative methods for storing previously calculated fitness values.

The first method is the hashing algorithm originally presented in Povinelli and Feng (1999b). Assuming the size of the hash table is initialized appropriately, the computational cost of hashing is constant. However, as the search space is explored and the table fills, the cost degrades to O(n) for both insertion and retrieval (Manber 1989, p. 80). The hashing technique used here avoids the O(n) cost by reinitializing the hash table as its performance degrades.

The second method investigated is a keep last generation algorithm, which stores the previous generation’s fitness values in a hash table that is reinitialized after each generation.

The third method is a binary tree algorithm, which has an average insertion and retrieval cost of O(log n) and a worst case cost of O(n) (Manber 1989, p. 73). On average, the cost of insertion and retrieval grows logarithmically as the search space is explored.
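
The three storage strategies described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the class names, the `max_size` reset threshold, the bit-string chromosome representation, and the reuse of duplicates within the current generation are all hypothetical choices. Python's built-in `dict` stands in for the hash table; it resizes itself, so the reset only mimics the reinitialization described in the text.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Chromosome = str  # assumption: chromosomes are bit strings such as "010011"
Fitness = Callable[[Chromosome], float]


class ResettingHashCache:
    """Hash-based store that is emptied once it holds max_size entries."""

    def __init__(self, fitness: Fitness, max_size: int) -> None:
        self.fitness = fitness
        self.max_size = max_size  # hypothetical reinitialization threshold
        self.table: Dict[Chromosome, float] = {}
        self.evaluations = 0  # counts real fitness evaluations

    def __call__(self, chromosome: Chromosome) -> float:
        if chromosome in self.table:
            return self.table[chromosome]
        if len(self.table) >= self.max_size:
            self.table = {}  # start over instead of letting the table degrade
        self.evaluations += 1
        value = self.fitness(chromosome)
        self.table[chromosome] = value
        return value


class KeepLastGenerationCache:
    """Keeps only the fitness values seen in the previous generation."""

    def __init__(self, fitness: Fitness) -> None:
        self.fitness = fitness
        self.previous: Dict[Chromosome, float] = {}
        self.evaluations = 0

    def evaluate_generation(self, population: Sequence[Chromosome]) -> List[float]:
        current: Dict[Chromosome, float] = {}
        values: List[float] = []
        for chromosome in population:
            if chromosome in current:          # duplicate within this generation
                value = current[chromosome]
            elif chromosome in self.previous:  # carried over from last generation
                value = self.previous[chromosome]
            else:
                self.evaluations += 1
                value = self.fitness(chromosome)
            current[chromosome] = value
            values.append(value)
        self.previous = current  # older values are discarded
        return values


class BinaryTreeStore:
    """Unbalanced binary search tree keyed by chromosome."""

    def __init__(self) -> None:
        self.root: Optional[list] = None  # node layout: [key, value, left, right]

    def put(self, key: Chromosome, value: float) -> None:
        if self.root is None:
            self.root = [key, value, None, None]
            return
        node = self.root
        while True:
            if key == node[0]:
                node[1] = value
                return
            side = 2 if key < node[0] else 3
            if node[side] is None:
                node[side] = [key, value, None, None]
                return
            node = node[side]

    def get(self, key: Chromosome) -> Tuple[bool, Optional[float]]:
        node = self.root
        while node is not None:
            if key == node[0]:
                return True, node[1]
            node = node[2] if key < node[0] else node[3]
        return False, None
```

The `evaluations` counters make the trade-off visible: the keep last generation store forgets a chromosome as soon as it skips one generation, whereas the resetting hash store retains it until the next reinitialization.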

Given the theoretical computational cost of the three techniques and the number of previously calculated fitness values each one stores, hashing is expected to provide the greatest benefit. This is experimentally confirmed. Surprisingly, however, the binary tree algorithm delivers the poorest performance of the three methods.

The test problem for this paper is an application of Time Series Data Mining (TSDM) (Povinelli 1999; Povinelli and Feng 1998; Povinelli and Feng 1999a) to identifying events in a financial time series, where an event is an important occurrence. The TSDM technique characterizes and predicts such events in time series by adapting data mining concepts for analyzing time series. Based soundly in dynamical systems theory (Takens 1980), the TSDM method reveals hidden temporal patterns in time series data by taking advantage of the event nature of many problems. The search mechanism at the heart of the TSDM method is a GA.

A simple GA composed of the standard selection, crossover, and mutation operators is used. The GA uses a binary chromosome of length 18, random locus crossover, and single individual elitism. The stopping criterion for the GA is convergence of all fitness values. Both tournament and roulette selection are investigated. The tournament selection uses a tournament of size two.

The benchmarks were performed with MATLAB 5.3.1 running under Windows NT 4.0 Service Pack 5. The computation time was obtained with MATLAB’s profiling tool, which reports a precision of 0.016 s. The hardware environment was a dual Pentium III 450 MHz with 256 MB of 100 MHz SDRAM, an 18 GB ultra-IDE hard drive, and a 32 MB AGP video card. Although the hardware contains two processors, MATLAB runs on only one processor.

POTENTIAL SOLUTIONS

The key to improving the performance of the GA is to reduce the time needed to calculate the fitness. Examining the mechanisms of the GA shows that the diversity of the population decreases as the algorithm runs.
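
The loss of diversity noted above can be made concrete by measuring how much of a population consists of repeated chromosomes. The following is a minimal Python sketch; the function name `redundant_fraction` and the bit-string representation of chromosomes are assumptions made for illustration and do not come from the paper.

```python
from typing import Sequence


def redundant_fraction(population: Sequence[str]) -> float:
    """Return the share of individuals that repeat another chromosome.

    A value of 0.0 means every chromosome is distinct; values near 1.0
    mean the population has nearly converged to a single chromosome.
    """
    if not population:
        return 0.0
    distinct = len(set(population))  # number of unique chromosomes
    return 1.0 - distinct / len(population)


# A diverse population and a nearly converged one (hypothetical data).
early = ["00", "01", "10", "11"]
late = ["11", "11", "11", "10"]
print(redundant_fraction(early), redundant_fraction(late))
```

Under these assumptions, the returned value is also the share of fitness evaluations within that generation that a lookup could replace.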
The fitness values for the same chromosomes are recalculated repeatedly. If previously calculated fitness values can be efficiently saved, computation time will diminish significantly.

The data mining problem used in this paper searches for temporal patterns in a time series. Finding a temporal pattern of length two requires chromosomes of length 18. This means that the search space contains $2^{18}$, or 262,144, members. With this number of members, the fitness values could be stored in an array. Although this is not an unmanageable size, the problem quickly becomes unwieldy for a slightly larger data mining problem. For example, a search for temporal patterns of length four requires a chromosome of length 30. This yields a search space of $2^{30}$ members, more than one billion. With current technology, it is not feasible to store an array of that size efficiently. This leads us to consider alternative methods for storing the fitness values.

Hashing Algorithm

The classic data structure for efficient storage and retrieval is the hash table (Manber 1989, pp. 78-79). The interface to a hash table provides two methods. The first is put, which takes two parameters: the key and an element. The put method stores the element with the associated key. The second method is get. It takes one parameter, the key, and returns two values: a flag indicating whether an element was found, and the element itself. Internally, the key-element pairs are stored in an array. The array is accessed through a hash, which is based on the key. Table 1 shows how the data structure is formed.

Table 1: Sample Hash Table Extract
Hash    Key    Element
10
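
The put and get interface described above can be sketched as follows. This is a minimal Python illustration, not the authors' MATLAB implementation: the class name `HashTable`, the default slot count, and collision handling by chaining are assumptions, since the text does not say how collisions are resolved.

```python
from typing import Any, Hashable, List, Tuple


class HashTable:
    """Array of slots indexed by a hash of the key (chaining is assumed)."""

    def __init__(self, size: int = 1024) -> None:
        self.size = size  # hypothetical fixed number of slots
        self.slots: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(size)]

    def _index(self, key: Hashable) -> int:
        # Map the key to an array position; Python's % is non-negative here.
        return hash(key) % self.size

    def put(self, key: Hashable, element: Any) -> None:
        """Store element under key, replacing any earlier element."""
        bucket = self.slots[self._index(key)]
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, element)
                return
        bucket.append((key, element))

    def get(self, key: Hashable) -> Tuple[bool, Any]:
        """Return (found flag, element); element is None when not found."""
        for stored_key, element in self.slots[self._index(key)]:
            if stored_key == key:
                return True, element
        return False, None


# Usage: look up a chromosome before evaluating its fitness.
table = HashTable(size=8)
table.put("010011", 0.75)          # hypothetical chromosome and fitness value
found, fitness = table.get("010011")
missing, _ = table.get("111111")
```

Returning a separate found flag, as the text describes, lets a stored fitness of zero be distinguished from a chromosome that has never been evaluated.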